
    The new science of metagenomics and the challenges of its use in both developed and developing countries

    Our view of the microbial world and its impact on human health is changing radically with the ability to sequence uncultured or unculturable microbes sampled directly from their habitats, an ability made possible by fast and cheap next-generation sequencing technologies. These recent developments represent a paradigm shift in the analysis of habitat biodiversity, be it the human, soil or ocean microbiome. We review here some research examples and results that indicate the importance of the microbiome in our lives and then discuss some of the challenges faced by metagenomic experiments and the subsequent analysis of the generated data. We then analyze the economic and social impact on genomic medicine and research in both developing and developed countries. We argue that there are significant benefits in building capacity for high-level scientific research in metagenomics in developing countries. Indeed, the notion that developing countries should wait for developed countries to make advances in science and technology that they later import at great cost has recently been challenged.

    Deep Learning for Metagenomic Data: using 2D Embeddings and Convolutional Neural Networks

    Deep learning (DL) techniques have had unprecedented success when applied to images, waveforms, and texts, to name a few. In general, when the sample size (N) is much greater than the number of features (d), DL outperforms previous machine learning (ML) techniques, often through the use of convolutional neural networks (CNNs). However, in many bioinformatics ML tasks, we encounter the opposite situation, where d is greater than N. In these situations, applying DL techniques (such as feed-forward networks) would lead to severe overfitting. Thus, sparse ML techniques (e.g., LASSO) usually yield the best results on these tasks. In this paper, we show how to apply CNNs to data that do not originally have an image structure (in particular, metagenomic data). Our first contribution is to show how to map metagenomic data in a meaningful way to 1D or 2D images. Based on this representation, we then apply a CNN with the aim of predicting various diseases. The proposed approach is applied to six different datasets including, in total, over 1000 samples from various diseases. This approach could be a promising one for prediction tasks in the bioinformatics field.
    Comment: Accepted at NIPS 2017 Workshop on Machine Learning for Health (https://ml4health.github.io/2017/); In Proceedings of the NIPS ML4H 2017 Workshop in Long Beach, CA, USA
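    The abstract does not detail the mapping, so the sketch below only illustrates one simple way to give a feature vector an image structure, under our own assumptions: a row-major "fill-up" of the abundance vector into the smallest zero-padded square grid, followed by the kind of "valid" 2D convolution a CNN layer applies. The function names `fill_up` and `conv2d_valid` are hypothetical, and the paper's actual embeddings may differ.

```python
import math
from typing import List

Matrix = List[List[float]]

def fill_up(features: List[float]) -> Matrix:
    """Place a 1D feature vector row by row into the smallest square grid, padding with zeros."""
    side = math.ceil(math.sqrt(len(features)))
    padded = list(features) + [0.0] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]

def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation (no padding, stride 1), as computed by a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical abundances of five taxa for one sample.
img = fill_up([1, 2, 3, 4, 5])
print(img)
print(conv2d_valid(img, [[1, 0], [0, 1]]))
```

    In a real model the kernel weights would be learned and many such filters stacked; the point here is only that, once the features sit on a grid, neighbouring cells become inputs to the same filter.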

    Perceptual Learning and Abstraction in Machine Learning: An Application to Autonomous Robotics

    This paper deals with the possible benefits of Perceptual Learning in Artificial Intelligence. On the one hand, Perceptual Learning is increasingly studied in neurobiology and is now considered an essential part of any living system. In fact, Perceptual Learning and Cognitive Learning are both necessary for learning and often depend on each other. On the other hand, many works in Machine Learning are concerned with "Abstraction" in order to reduce the complexity of some learning tasks. In the Abstraction framework, Perceptual Learning can be seen as a specific process that learns how to transform the data before the traditional learning task itself takes place. In this paper, we argue that biologically inspired Perceptual Learning mechanisms could be used to build efficient low-level Abstraction operators that deal with real-world data. To illustrate this, we present an application where perceptual-learning-inspired meta-operators are used to perform an abstraction on an autonomous robot's visual perception. The goal of this work is to enable the robot to learn how to identify objects it encounters in its environment.
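    To make the idea of a learned low-level abstraction operator concrete, here is a minimal sketch that is not the paper's meta-operators: a hypothetical operator that learns grey-level band boundaries from observed pixel intensities (equal-frequency quantiles) and then maps each image to the much smaller set of band indices before any classifier sees it. The names `learn_thresholds` and `abstract_image`, and the choice of quantile quantization, are our assumptions.

```python
from typing import List

def learn_thresholds(pixels: List[int], levels: int) -> List[int]:
    """Learn band boundaries from observed intensities (equal-frequency quantiles)."""
    ordered = sorted(pixels)
    return [ordered[len(ordered) * i // levels] for i in range(1, levels)]

def abstract_image(image: List[List[int]], thresholds: List[int]) -> List[List[int]]:
    """Map each pixel to the index of its intensity band: a coarser, lower-complexity view."""
    return [[sum(p >= t for t in thresholds) for p in row] for row in image]

training_pixels = list(range(256))  # hypothetical grey levels observed during exploration
th = learn_thresholds(training_pixels, levels=4)
print(th)
print(abstract_image([[0, 70], [130, 255]], th))
```

    Because the thresholds come from the data the robot has actually perceived, the operator adapts to its environment, which is the sense in which it "learns how to transform the data" before the main learning task.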

    Rounding Methods for Discrete Linear Classification (Extended Version)

    Learning discrete linear classifiers is known to be a difficult challenge. In this paper, this learning task is cast as a combinatorial optimization problem: given a training sample formed by positive and negative feature vectors in Euclidean space, the goal is to find a discrete linear function that minimizes the cumulative hinge loss of the sample. Since this problem is NP-hard, we examine two simple rounding algorithms that discretize the fractional solution of the problem's continuous relaxation. Generalization bounds are derived for several classes of binary-weighted linear functions, by analyzing the Rademacher complexity of these classes and by establishing approximation bounds for our rounding algorithms. Our methods are evaluated on both synthetic and real-world data.
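    The relax-then-round pattern can be sketched as follows, assuming weights in {-1, +1} (one of several binary-weighted classes one could choose). We stand in for the fractional solver with projected subgradient descent over the box [-1, 1]^d and show two textbook rounding schemes, deterministic sign rounding and unbiased randomized rounding; these are illustrations and not necessarily the two algorithms analyzed in the paper.

```python
import random
from typing import List, Sequence

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))

def hinge_loss(w, X, y) -> float:
    """Cumulative hinge loss: sum_i max(0, 1 - y_i <w, x_i>)."""
    return sum(max(0.0, 1.0 - yi * dot(w, xi)) for xi, yi in zip(X, y))

def relaxed_solution(X, y, steps=200, lr=0.05) -> List[float]:
    """Fractional weights in [-1, 1]^d via projected subgradient descent on the hinge loss."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1.0:  # hinge is active: subgradient is -y_i x_i
                for j in range(d):
                    grad[j] -= yi * xi[j]
        # gradient step, then projection back onto the box [-1, 1]^d
        w = [min(1.0, max(-1.0, wj - lr * gj)) for wj, gj in zip(w, grad)]
    return w

def round_deterministic(w: Sequence[float]) -> List[int]:
    return [1 if wj >= 0 else -1 for wj in w]

def round_randomized(w: Sequence[float], rng: random.Random) -> List[int]:
    """P(+1) = (1 + w_j) / 2, so each rounded weight equals w_j in expectation."""
    return [1 if rng.random() < (1.0 + wj) / 2.0 else -1 for wj in w]

X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [1, -1, -1, 1]
w_frac = relaxed_solution(X, y)
w_bin = round_deterministic(w_frac)
print(w_frac, w_bin, hinge_loss(w_bin, X, y))
print(round_randomized(w_frac, random.Random(0)))
```

    On this toy sample the relaxation already lands on a vertex of the box, so both roundings agree; in general the fractional weights lie strictly inside the box, and the rounding step is where approximation bounds such as those mentioned above become necessary.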

    Domain Abstraction of Highly Correlated Pairs to Recommend in the Long Tail

    Among the difficulties encountered by modern shopping recommenders is the long-tail shape of the distribution of sold items, which is also related to cold-start issues. Various approaches, including content-based recommendation, attempt to overcome this problem, which seriously affects the accuracy of recommendations, especially when new products are continuously added to the catalogue. This paper investigates the use of an algorithm that searches for highly correlated pairs between abstractions of items. The advantage of this approach is evaluated on real data, showing better results than an approach based only on concrete pairs of items. Using rigorous protocols such as Given-n, experimental results show significant improvement in both recommendation accuracy and the recommendation of products in the long tail.
    Keywords: Knowledge Discovery, Mining Correlated Pairs, Recommender Systems
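    The following sketch shows why correlating abstractions can help in the long tail. It assumes a hypothetical item-to-category map as the abstraction and uses the phi coefficient over shopping baskets as the correlation measure; both choices are ours, and the paper's algorithm and measure may differ.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

def abstract_baskets(baskets: Iterable[set], item_to_category: Dict[str, str]) -> List[FrozenSet[str]]:
    """Replace each concrete item by its (hypothetical) category, i.e. its abstraction."""
    return [frozenset(item_to_category[i] for i in b) for b in baskets]

def phi(a: str, b: str, baskets: List[FrozenSet[str]]) -> float:
    """Phi coefficient between the presence of a and the presence of b across baskets."""
    n11 = n10 = n01 = n00 = 0
    for basket in baskets:
        has_a, has_b = a in basket, b in basket
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

def correlated_pairs(baskets: List[FrozenSet[str]], threshold: float) -> List[Tuple[str, str, float]]:
    """All label pairs whose phi coefficient reaches the threshold."""
    labels = sorted(set().union(*baskets))
    pairs = []
    for a, b in combinations(labels, 2):
        r = phi(a, b, baskets)
        if r >= threshold:
            pairs.append((a, b, r))
    return pairs

baskets = [{"a1", "b1"}, {"a2", "b2"}, {"c1"}, {"a1", "b2"}]
categories = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
print(correlated_pairs(abstract_baskets(baskets, categories), threshold=0.5))
```

    Here no concrete pair of items co-occurs more than once, but categories A and B co-occur in three of the four baskets and are perfectly correlated, so a rarely sold item of B can still be recommended to a buyer of an A item.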

    Qluster: An easy-to-implement generic workflow for robust clustering of health data

    Exploring health data with clustering algorithms allows a better description of the populations of interest by seeking the sub-profiles that compose them. This reinforces medical knowledge, whether about a disease or a targeted population in real life. Nevertheless, contrary to conventional biostatistical methods, for which numerous guidelines exist, the standardization of data science approaches in clinical research remains little discussed. This results in significant variability in the execution of data science projects, in terms of the algorithms used as well as the reliability and credibility of the designed approach. Taking the path of a parsimonious and judicious choice of both algorithms and implementations at each stage, this article proposes Qluster, a practical workflow for performing clustering tasks. This workflow makes a compromise between (1) genericity of applications (e.g., usable on small or big data, on continuous, categorical or mixed variables, on high-dimensional databases or not), (2) ease of implementation (few packages, few algorithms, few parameters, ...), and (3) robustness (e.g., use of proven algorithms and robust packages, evaluation of cluster stability, management of noise and multicollinearity). This workflow can be easily automated and/or routinely applied to a wide range of clustering projects. It can be useful both for data scientists with little experience in the field, to make data clustering easier and more robust, and for more experienced data scientists looking for a straightforward and reliable solution to routinely perform preliminary data mining. A synthesis of the literature on data clustering, as well as the scientific rationale supporting the proposed workflow, is also provided. Finally, a detailed application of the workflow to a concrete use case is provided, along with a practical discussion for data scientists. An implementation on the Dataiku platform is available upon request to the authors.
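    One of the robustness ingredients listed above, evaluating cluster stability, can be illustrated with a short sketch. The abstract does not say which algorithms Qluster uses, so this is only an assumed stand-in: a simple k-medoids with deterministic farthest-first initialization, plus a bootstrap score that, for each reference cluster, averages its best Jaccard match among the clusters found on resampled data. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Set

Point = Sequence[float]

def farthest_first(points: List[Point], k: int) -> List[int]:
    """Deterministic init: start at point 0, then add the point farthest from the chosen medoids."""
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(len(points)),
                           key=lambda i: min(math.dist(points[i], points[m]) for m in medoids)))
    return medoids

def assign(points: List[Point], medoids: List[int]) -> List[Set[int]]:
    clusters: List[Set[int]] = [set() for _ in medoids]
    for i, p in enumerate(points):
        nearest = min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
        clusters[nearest].add(i)
    return clusters

def k_medoids(points: List[Point], k: int, iters: int = 20) -> List[Set[int]]:
    medoids = farthest_first(points, k)
    for _ in range(iters):
        clusters = assign(points, medoids)
        # new medoid = member minimising the summed distance to the rest of its cluster
        updated = [min(c, key=lambda m: sum(math.dist(points[m], points[o]) for o in c)) if c else medoids[j]
                   for j, c in enumerate(clusters)]
        if updated == medoids:
            break
        medoids = updated
    return assign(points, medoids)

def jaccard(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def bootstrap_stability(points: List[Point], k: int, n_boot: int = 20, seed: int = 0) -> List[float]:
    """Mean best-match Jaccard similarity of each reference cluster across bootstrap resamples."""
    rng = random.Random(seed)
    reference = k_medoids(points, k)
    totals = [0.0] * k
    for _ in range(n_boot):
        # resample with replacement and keep the distinct indices drawn
        idx = sorted(set(rng.choices(range(len(points)), k=len(points))))
        boot = [{idx[j] for j in c} for c in k_medoids([points[i] for i in idx], k)]
        for c, ref in enumerate(reference):
            ref_seen = ref & set(idx)  # compare only on points present in this resample
            totals[c] += max(jaccard(ref_seen, b) for b in boot)
    return [t / n_boot for t in totals]

group_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
pts = group_a + group_b
print(k_medoids(pts, 2))
print(bootstrap_stability(pts, 2))
```

    Scores close to 1 indicate clusters that reappear under resampling; a cluster whose score drops well below that is a candidate for being noise rather than a genuine sub-profile.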

    Use of the C4.5 machine learning algorithm to test a clinical guideline-based decision support system.

    Well-designed medical decision support systems (DSSs) have been shown to improve health care quality. However, before they can be used in real clinical situations, these systems must be extensively tested to ensure that they conform to the clinical guidelines (CGs) on which they are based. Existing methods cannot be used for the systematic testing of all possible test cases. We describe here a new exhaustive dynamic verification method. In this method, the DSS is considered a black box, and Quinlan's C4.5 algorithm is used to build a decision tree from an exhaustive set of DSS input vectors and the corresponding outputs. This method was successfully used to test a medical DSS for chronic diseases: the ASTI critiquing module for type 2 diabetes.
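    The verification loop can be sketched as: enumerate every combination of (discretized) inputs, query the DSS as a black box, and induce a decision tree whose branches can then be compared with the guideline. The DSS below is a hypothetical toy with invented attribute names, and the learner is a simplified C4.5-style tree (gain-ratio splits on discrete attributes only, no pruning or continuous thresholds), not Quinlan's full implementation.

```python
import math
from collections import Counter
from itertools import product

ATTRS = ["hba1c_high", "on_metformin", "renal_failure"]  # hypothetical inputs

def toy_dss(case: dict) -> str:
    """Hypothetical black-box DSS standing in for the real system under test."""
    if case["renal_failure"]:
        return "stop_metformin"
    if case["hba1c_high"] and not case["on_metformin"]:
        return "start_metformin"
    return "continue"

def entropy(labels) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr) -> float:
    """C4.5's split criterion: information gain divided by split information."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return (entropy(labels) - remainder) / split_info if split_info > 0 else 0.0

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: the (majority) output
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    branches = {}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs], [l for _, l in pairs],
                                     [a for a in attrs if a != best])
    return (best, branches)

def classify(tree, case: dict) -> str:
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[case[attr]]
    return tree

# Exhaustive set of input vectors: every combination of the boolean inputs.
rows = [dict(zip(ATTRS, values)) for values in product([False, True], repeat=len(ATTRS))]
labels = [toy_dss(r) for r in rows]
tree = build_tree(rows, labels, ATTRS)
print(tree)
```

    Since the input set is exhaustive and the DSS is deterministic, the tree reproduces the DSS on every case, so reviewing its few branches against the guideline amounts to reviewing the system's complete behaviour.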

    GAMA: A Spatially Explicit, Multi-level, Agent-Based Modeling and Simulation Platform

    Agent-based modeling is now widely used to investigate complex systems but still lacks integrated and generic tools to support the representation of features usually associated with real complex systems, namely rich, dynamic and realistic environments or multiple levels of agency. The GAMA platform has been developed to address such issues and, thanks to a high-level modeling language, allows modelers to build, couple and reuse complex models combining various agent architectures, environment representations and levels of abstraction.
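    GAMA models are written in the platform's own modeling language, which we do not reproduce here. As a language-neutral illustration of what "spatially explicit" means, the plain-Python sketch below has agents that perceive a toroidal grid environment, move within it, and modify it by consuming a resource. The class and function names are hypothetical and are not part of GAMA's API.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Forager:
    x: int
    y: int
    energy: float = 0.0

class GridEnvironment:
    """Toroidal grid whose cells hold a consumable resource."""

    def __init__(self, width: int, height: int, initial: float):
        self.width, self.height = width, height
        self.resource = [[initial for _ in range(width)] for _ in range(height)]

    def neighbourhood(self, x: int, y: int) -> List[Tuple[int, int]]:
        """Moore neighbourhood (including the cell itself), wrapping at the edges."""
        return [((x + dx) % self.width, (y + dy) % self.height)
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

    def total(self) -> float:
        return sum(sum(row) for row in self.resource)

def step(env: GridEnvironment, agents: List[Forager]) -> None:
    for a in agents:
        # perceive the local environment and move to the richest cell (first one wins ties)
        a.x, a.y = max(env.neighbourhood(a.x, a.y), key=lambda c: env.resource[c[1]][c[0]])
        # act on the environment: consume the resource in the new cell
        a.energy += env.resource[a.y][a.x]
        env.resource[a.y][a.x] = 0.0

env = GridEnvironment(5, 5, 1.0)
agents = [Forager(0, 0)]
for _ in range(3):
    step(env, agents)
print(agents[0], env.total())
```

    Keeping the environment as its own object, rather than as attributes of the agents, is what lets richer environment representations (GIS layers, networks, continuous space) be swapped in without rewriting agent behaviour.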

    GAMA: multi-level and complex environment for agent-based models and simulations (demonstration)

    Agent-based models are now used in numerous application domains (ecology, social sciences, etc.), but their use is still impeded by the lack of generic yet ready-to-use tools supporting the design and simulation of complex models integrating multiple levels of agency and realistic environments. The GAMA modeling and simulation platform is proposed to address such issues. It allows modelers to build complex models thanks to a high-level modeling language, various agent architectures, advanced environment representations and built-in multi-level support.
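    "Multiple levels of agency" can be pictured as a macro-agent that temporarily absorbs micro-agents and acts on their behalf. The plain-Python sketch below illustrates that idea only; it does not reproduce GAMA's multi-level constructs, and the `Micro`/`Group` classes and their `capture`/`move`/`release` methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Micro:
    name: str
    pos: float

@dataclass
class Group:
    """Macro-level agent that temporarily owns the micro-agents it has captured."""
    members: List[Micro] = field(default_factory=list)

    def capture(self, world: List[Micro], centre: float, radius: float) -> None:
        caught = [a for a in world if abs(a.pos - centre) <= radius]
        for a in caught:
            world.remove(a)  # captured agents leave the top-level population
        self.members.extend(caught)

    def move(self, delta: float) -> None:
        # a single macro-level action updates every member at once
        for a in self.members:
            a.pos += delta

    def release(self, world: List[Micro]) -> None:
        world.extend(self.members)  # members return to the top level
        self.members.clear()

world = [Micro("a", 0.0), Micro("b", 1.0), Micro("c", 10.0)]
group = Group()
group.capture(world, centre=0.0, radius=2.0)
print([a.name for a in world], [a.name for a in group.members])
group.move(5.0)
group.release(world)
print(sorted((a.name, a.pos) for a in world))
```

    Removing captured agents from the top-level population means they are not scheduled twice while the group moves them, which is the main bookkeeping concern when switching between levels.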